The first two chunks of this R Markdown file (after the setup chunk) enable plot zooming, which means the HTML file must be opened in a browser to view the document properly. When the document knits in RStudio the preview will appear empty, but the HTML file opened in a browser will contain everything, and you can click on any plot to zoom in on it.
A few notes about this script.
If you are running this with the 2022-2023 data, make sure you download the whole [OSM_2022-2023 GitHub repository](https://github.com/ACMElabUvic/OSM_2022-2023) from the ACMElabUvic GitHub. This will ensure you have all the files, data, and proper folder structure you will need to run this code and associated analyses.
Also make sure you open RStudio through the R project (OSM_2022-2023.Rproj). This will automatically set your working directory to the correct place (wherever you saved the repository) and ensure you don’t have to change the file paths for some of the data.
Lastly, if you are looking to adapt this code for a future year of data, you will want to ensure you have run the ACME_camera_script_9-2-2024.R or .Rmd with your data as there is much data formatting, cleaning, and restructuring that has to be done before this code will work.
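As a quick sanity check that RStudio was opened through the project file, something like the following could be run before anything else. It is only a sketch: check_project_file() is a hypothetical helper (it is not part of the repository), and the path shown is the covariate file read in later in this script.

```r
# Hypothetical helper (not part of the repository): confirm that a file can be
# found relative to the current working directory before trying to read it.
check_project_file <- function(path) {
  found <- file.exists(path)
  if (!found) {
    message("Cannot find '", path, "' from ", getwd(),
            ". Was RStudio opened through the .Rproj file?")
  }
  found
}

# path used later in this script
check_project_file('data/processed/OSM_2022_covariates.csv')
```

If this returns FALSE, the working directory is probably not the repository root.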
If you have questions, please email the most recent author, currently:
Marissa A. Dyck
Postdoctoral research fellow
University of Victoria
School of Environmental Studies
Email: marissadyck17@gmail.com
(update/add authors as needed)
If you don’t already have the following packages installed, use the code below to install them.
install.packages('tidyverse')
install.packages('PerformanceAnalytics')
install.packages('Hmisc')
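If you would rather not reinstall packages you already have, a small check along these lines may help. This is a sketch only: missing_packages() is a hypothetical helper written for illustration, and the install call is left commented out so nothing is installed by accident.

```r
# packages this script relies on
pkgs <- c('tidyverse', 'PerformanceAnalytics', 'Hmisc')

# Hypothetical helper: return the subset of 'pkgs' that is not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)

# uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```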
Then load the packages to your library.
library(tidyverse) # data tidying, visualization, and much more; this will load all tidyverse packages, can see complete list using tidyverse_packages()
library(PerformanceAnalytics) #Used to generate a correlation plot
library(Hmisc) # used to generate histograms for all variables in data frame
To do any analysis with the detection data from the OSM arrays, we will want to pair it with the covariate data, which has human factors indices (HFI) and landcover data (VEG) for each site. There are a lot of covariates/features in these datasets that need to be grouped together to be usable, which is what this script covers.
Let’s read in the covariate data (outputs from the ACME_camera_script_9-2-2024.Rmd)
# model covariates (merged HFI and VEG data from the ACME_camera_script_9-2-2024.R or .Rmd)
covariates <- read_csv('data/processed/OSM_2022_covariates.csv',
# set the column types to read in correctly
col_types = cols(array = col_factor(),
camera = col_factor(),
site = col_factor(),
buff_dist = col_factor(),
.default = col_number()))
# check variable structure
str(covariates)
## spc_tbl_ [3,100 × 119] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ array : Factor w/ 4 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ camera : Factor w/ 96 levels "18","15","03",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ site : Factor w/ 155 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ buff_dist : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ vegetated_edge_roads : num [1:3100] 0 0.0858 0 0 0 ...
## $ harvest_area : num [1:3100] 0 0 0.687 0.337 0 ...
## $ road_gravel_1l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ conventional_seismic : num [1:3100] 0 0.03276 0 0.00889 0.01145 ...
## $ tame_pasture : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ pipeline : num [1:3100] 0 0.068 0 0 0.0301 ...
## $ road_gravel_2l : num [1:3100] 0 0 0 0 0 ...
## $ trail : num [1:3100] 0.00588 0.0028 0 0.00196 0 ...
## $ well_bitumen : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ rough_pasture : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_aband : num [1:3100] 0 0 0 0 0.0322 ...
## $ road_unclassified : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ crop : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ low_impact_seismic : num [1:3100] 0 0 0 0 0.0523 ...
## $ clearing_unknown : num [1:3100] 0.0923 0.0697 0 0 0 ...
## $ cultivation_abandoned : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_undiv_2l : num [1:3100] 0 0.0174 0 0 0 ...
## $ road_unimproved : num [1:3100] 0 0 0 0 0 ...
## $ truck_trail : num [1:3100] 0 0 0 0.0139 0 ...
## $ dugout : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_undiv_1l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_gas : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ vegetated_edge_railways : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ harvest_area_white_zone : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ country_residence : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ borrowpit_dry : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ rural_residence : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ borrowpit_wet : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ borrowpits : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ grvl_sand_pit : num [1:3100] 0 0.0873 0 0 0 ...
## $ ris_reclaimed_temp : num [1:3100] 0 0.0477 0 0 0 ...
## $ ris_clearing_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_drainage : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_mines_oilsands : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_overburden_dump : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_facility_operations : num [1:3100] 0 0 0 0 0 ...
## $ transmission_line : num [1:3100] 0.0642 0 0 0 0.091 ...
## $ ris_tailing_pond : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ clearing_wellpad_unconfirmed: num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ mines_oilsands : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_soil_replaced : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_1l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_oilsands_rms : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_facility_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_borrowpits : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_transmission_line : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_soil_salvaged : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_road : num [1:3100] 0 0 0 0 0 ...
## $ ris_plant : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ urban_residence : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ facility_other : num [1:3100] 0 0 0 0 0 ...
## $ airp_runway : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ runway : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_reclaimed_permanent : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ urban_industrial : num [1:3100] 0.291 0 0 0 0 ...
## $ lagoon : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ facility_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ residence_clearing : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_cased : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_unpaved_2l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_3l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ surrounding_veg : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ rlwy_sgl_track : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_winter : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ sump : num [1:3100] 0 0 0 0 0 ...
## $ greenspace : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_2l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_other : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ canal : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ reservoir : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_cleared_not_confirmed : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ misc_oil_gas_facility : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ camp_industrial : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_camp_industrial : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ oil_gas_plant : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_utilities : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ cfo : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ recreation : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ campground : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ peat : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ golfcourse : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ landfill : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ transfer_station : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ mill : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_div : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ rlwy_spur : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_cleared_not_drilled : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ open_pit_mine : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_oil : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_4l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ mines_pitlake : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_reclaimed_certified : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_windrow : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ tailing_pond : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## [list output truncated]
## - attr(*, "spec")=
## .. cols(
## .. .default = col_number(),
## .. array = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. camera = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. buff_dist = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. vegetated_edge_roads = col_number(),
## .. harvest_area = col_number(),
## .. road_gravel_1l = col_number(),
## .. conventional_seismic = col_number(),
## .. tame_pasture = col_number(),
## .. pipeline = col_number(),
## .. road_gravel_2l = col_number(),
## .. trail = col_number(),
## .. well_bitumen = col_number(),
## .. rough_pasture = col_number(),
## .. well_aband = col_number(),
## .. road_unclassified = col_number(),
## .. crop = col_number(),
## .. low_impact_seismic = col_number(),
## .. clearing_unknown = col_number(),
## .. cultivation_abandoned = col_number(),
## .. road_paved_undiv_2l = col_number(),
## .. road_unimproved = col_number(),
## .. truck_trail = col_number(),
## .. dugout = col_number(),
## .. road_paved_undiv_1l = col_number(),
## .. well_gas = col_number(),
## .. vegetated_edge_railways = col_number(),
## .. harvest_area_white_zone = col_number(),
## .. country_residence = col_number(),
## .. borrowpit_dry = col_number(),
## .. rural_residence = col_number(),
## .. borrowpit_wet = col_number(),
## .. borrowpits = col_number(),
## .. grvl_sand_pit = col_number(),
## .. ris_reclaimed_temp = col_number(),
## .. ris_clearing_unknown = col_number(),
## .. ris_drainage = col_number(),
## .. ris_mines_oilsands = col_number(),
## .. ris_overburden_dump = col_number(),
## .. ris_facility_operations = col_number(),
## .. transmission_line = col_number(),
## .. ris_tailing_pond = col_number(),
## .. clearing_wellpad_unconfirmed = col_number(),
## .. mines_oilsands = col_number(),
## .. ris_soil_replaced = col_number(),
## .. road_paved_1l = col_number(),
## .. ris_oilsands_rms = col_number(),
## .. ris_facility_unknown = col_number(),
## .. ris_borrowpits = col_number(),
## .. ris_transmission_line = col_number(),
## .. ris_soil_salvaged = col_number(),
## .. ris_road = col_number(),
## .. ris_plant = col_number(),
## .. urban_residence = col_number(),
## .. facility_other = col_number(),
## .. airp_runway = col_number(),
## .. runway = col_number(),
## .. ris_reclaimed_permanent = col_number(),
## .. urban_industrial = col_number(),
## .. lagoon = col_number(),
## .. facility_unknown = col_number(),
## .. residence_clearing = col_number(),
## .. well_cased = col_number(),
## .. road_unpaved_2l = col_number(),
## .. road_paved_3l = col_number(),
## .. surrounding_veg = col_number(),
## .. rlwy_sgl_track = col_number(),
## .. road_winter = col_number(),
## .. sump = col_number(),
## .. greenspace = col_number(),
## .. road_paved_2l = col_number(),
## .. well_other = col_number(),
## .. canal = col_number(),
## .. reservoir = col_number(),
## .. well_cleared_not_confirmed = col_number(),
## .. misc_oil_gas_facility = col_number(),
## .. camp_industrial = col_number(),
## .. ris_camp_industrial = col_number(),
## .. oil_gas_plant = col_number(),
## .. well_unknown = col_number(),
## .. ris_utilities = col_number(),
## .. cfo = col_number(),
## .. recreation = col_number(),
## .. campground = col_number(),
## .. peat = col_number(),
## .. golfcourse = col_number(),
## .. landfill = col_number(),
## .. transfer_station = col_number(),
## .. mill = col_number(),
## .. road_paved_div = col_number(),
## .. rlwy_spur = col_number(),
## .. well_cleared_not_drilled = col_number(),
## .. open_pit_mine = col_number(),
## .. well_oil = col_number(),
## .. road_paved_4l = col_number(),
## .. mines_pitlake = col_number(),
## .. ris_reclaimed_certified = col_number(),
## .. ris_windrow = col_number(),
## .. tailing_pond = col_number(),
## .. rlwy_mlt_track = col_number(),
## .. rlwy_dbl_track = col_number(),
## .. ris_waste = col_number(),
## .. interchange_ramp = col_number(),
## .. road_paved_5l = col_number(),
## .. ris_airp_runway = col_number(),
## .. fruit_vegetables = col_number(),
## .. road_unpaved_1l = col_number(),
## .. ris_reclaim_ready = col_number(),
## .. ris_tank_farm = col_number(),
## .. lc_class20 = col_number(),
## .. lc_class32 = col_number(),
## .. lc_class33 = col_number(),
## .. lc_class34 = col_number(),
## .. lc_class50 = col_number(),
## .. lc_class110 = col_number(),
## .. lc_class120 = col_number(),
## .. lc_class210 = col_number(),
## .. lc_class220 = col_number(),
## .. lc_class230 = col_number()
## .. )
## - attr(*, "problems")=<externalptr>
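To see how the col_types argument used when reading in the covariates behaves, here is a minimal sketch on a few made-up rows (the values are invented and only mimic the layout of the real file). It assumes readr 2.0 or later, where literal data can be passed with I().

```r
library(readr)

# made-up rows that mimic the layout of the covariates file
toy_csv <- I("array,site,buff_dist,pipeline\nLU13,LU13_18,250,0.068\nLU13,LU13_18,500,0.051\nLU15,LU15_02,250,0\n")

toy <- read_csv(toy_csv,
                # name the identifier columns explicitly, everything else is numeric
                col_types = cols(array = col_factor(),
                                 site = col_factor(),
                                 buff_dist = col_factor(),
                                 .default = col_number()))

# the three identifier columns are factors, the remaining column is numeric
sapply(toy, class)
```

Using .default means new HFI or landcover columns in a future year of data would be read as numbers without having to list each one.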
There are too many covariates to include in the models individually and many of them describe similar HFI features.
This section is now finalized: for this and future related analyses we format the covariates using the structure outlined in covariates_table.docx, which can be found in the ‘relevant_literature’ folder of this repository. The code below nonetheless walks through the data exploration that led to some of the decisions in covariates_table.docx, so that anyone who wants to group the data differently has code to explore it.
The covariate_table and the README file in this repository include descriptions of each feature, taken from the ABMI human footprints wall-to-wall data download website for Year 2021. These descriptions can also be found in the relevant_literature folder of this repository (HFI_2021_v1_0_Metadata_Final.pdf).
First, let’s order the columns alphabetically so it is easier to look up the description of each feature in the ABMI document. We want the non-covariate columns (i.e., array, site, camera, buff_dist) at the front, so after ordering all of the columns we use relocate() to move these four to the front of the data.
covariates <- covariates %>%
# order columns alphabetically
select(order(colnames(.))) %>%
# we want to move the columns that aren't HFI features or landcover to the front
relocate(.,
c(array,
site,
camera,
buff_dist))
# get a list of column names to ensure it worked
names(covariates)
## [1] "array" "site"
## [3] "camera" "buff_dist"
## [5] "airp_runway" "borrowpit_dry"
## [7] "borrowpit_wet" "borrowpits"
## [9] "camp_industrial" "campground"
## [11] "canal" "cfo"
## [13] "clearing_unknown" "clearing_wellpad_unconfirmed"
## [15] "conventional_seismic" "country_residence"
## [17] "crop" "cultivation_abandoned"
## [19] "dugout" "facility_other"
## [21] "facility_unknown" "fruit_vegetables"
## [23] "golfcourse" "greenspace"
## [25] "grvl_sand_pit" "harvest_area"
## [27] "harvest_area_white_zone" "interchange_ramp"
## [29] "lagoon" "landfill"
## [31] "lc_class110" "lc_class120"
## [33] "lc_class20" "lc_class210"
## [35] "lc_class220" "lc_class230"
## [37] "lc_class32" "lc_class33"
## [39] "lc_class34" "lc_class50"
## [41] "low_impact_seismic" "mill"
## [43] "mines_oilsands" "mines_pitlake"
## [45] "misc_oil_gas_facility" "oil_gas_plant"
## [47] "open_pit_mine" "peat"
## [49] "pipeline" "recreation"
## [51] "reservoir" "residence_clearing"
## [53] "ris_airp_runway" "ris_borrowpits"
## [55] "ris_camp_industrial" "ris_clearing_unknown"
## [57] "ris_drainage" "ris_facility_operations"
## [59] "ris_facility_unknown" "ris_mines_oilsands"
## [61] "ris_oilsands_rms" "ris_overburden_dump"
## [63] "ris_plant" "ris_reclaim_ready"
## [65] "ris_reclaimed_certified" "ris_reclaimed_permanent"
## [67] "ris_reclaimed_temp" "ris_road"
## [69] "ris_soil_replaced" "ris_soil_salvaged"
## [71] "ris_tailing_pond" "ris_tank_farm"
## [73] "ris_transmission_line" "ris_utilities"
## [75] "ris_waste" "ris_windrow"
## [77] "rlwy_dbl_track" "rlwy_mlt_track"
## [79] "rlwy_sgl_track" "rlwy_spur"
## [81] "road_gravel_1l" "road_gravel_2l"
## [83] "road_paved_1l" "road_paved_2l"
## [85] "road_paved_3l" "road_paved_4l"
## [87] "road_paved_5l" "road_paved_div"
## [89] "road_paved_undiv_1l" "road_paved_undiv_2l"
## [91] "road_unclassified" "road_unimproved"
## [93] "road_unpaved_1l" "road_unpaved_2l"
## [95] "road_winter" "rough_pasture"
## [97] "runway" "rural_residence"
## [99] "sump" "surrounding_veg"
## [101] "tailing_pond" "tame_pasture"
## [103] "trail" "transfer_station"
## [105] "transmission_line" "truck_trail"
## [107] "urban_industrial" "urban_residence"
## [109] "vegetated_edge_railways" "vegetated_edge_roads"
## [111] "well_aband" "well_bitumen"
## [113] "well_cased" "well_cleared_not_confirmed"
## [115] "well_cleared_not_drilled" "well_gas"
## [117] "well_oil" "well_other"
## [119] "well_unknown"
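The same ordering step can be tried on a tiny made-up data frame to see what select(order(colnames(.))) and relocate() each do. The toy columns below are invented for illustration and borrow names from the real data.

```r
library(dplyr)

# made-up one-row data frame with columns in no particular order
toy <- tibble(site = 'LU13_18',
              pipeline = 0.07,
              array = 'LU13',
              canal = 0,
              buff_dist = '250')

toy_ordered <- toy %>%
  # order columns alphabetically
  select(order(colnames(.))) %>%
  # then move the identifier columns to the front, in the order listed
  relocate(array, site, buff_dist)

names(toy_ordered)
```

relocate() moves the named columns to the front by default, so the remaining columns keep their alphabetical order.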
Let’s get a summary of each variable now. We filter to just the 1000 m buffer width so we don’t have a bunch of repeated data for each buffer width at each site; this gives general insight into how much variability each feature has at a single, general buffer width. You can change this if you are interested in a different buffer width specifically, or if it makes more sense to see the data for the min (250 m) or max (5000 m) buffer width.
covariates %>%
# filter to just buffer 1000 m
filter(buff_dist == 1000) %>%
summary(.)
## array site camera buff_dist airp_runway
## LU13:41 LU13_18: 1 27 : 4 1000 :155 Min. :0
## LU15:39 LU13_15: 1 32 : 4 250 : 0 1st Qu.:0
## LU21:36 LU13_03: 1 41 : 4 500 : 0 Median :0
## LU01:39 LU13_34: 1 36 : 4 750 : 0 Mean :0
## LU13_57: 1 16 : 3 1250 : 0 3rd Qu.:0
## LU13_16: 1 21 : 3 1500 : 0 Max. :0
## (Other):149 (Other):133 (Other): 0
## borrowpit_dry borrowpit_wet borrowpits
## Min. :0.0000000 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0.0000000 Median :0.0000000 Median :0.0000000
## Mean :0.0009388 Mean :0.0006446 Mean :0.0002542
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :0.0300351 Max. :0.0198622 Max. :0.0072821
##
## camp_industrial campground canal cfo clearing_unknown
## Min. :0.0000000 Min. :0 Min. :0 Min. :0 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0.000000
## Median :0.0000000 Median :0 Median :0 Median :0 Median :0.000000
## Mean :0.0003785 Mean :0 Mean :0 Mean :0 Mean :0.006422
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.001654
## Max. :0.0160772 Max. :0 Max. :0 Max. :0 Max. :0.182912
##
## clearing_wellpad_unconfirmed conventional_seismic country_residence
## Min. :0.0000000 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0.002498 1st Qu.:0.0000000
## Median :0.0000000 Median :0.005202 Median :0.0000000
## Mean :0.0004428 Mean :0.006143 Mean :0.0000828
## 3rd Qu.:0.0000000 3rd Qu.:0.009577 3rd Qu.:0.0000000
## Max. :0.0117571 Max. :0.020381 Max. :0.0128340
##
## crop cultivation_abandoned dugout facility_other
## Min. :0 Min. :0.000e+00 Min. :0 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0.000000
## Median :0 Median :0.000e+00 Median :0 Median :0.000000
## Mean :0 Mean :5.408e-05 Mean :0 Mean :0.001119
## 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0.000000
## Max. :0 Max. :8.383e-03 Max. :0 Max. :0.062266
##
## facility_unknown fruit_vegetables golfcourse greenspace
## Min. :0.000e+00 Min. :0 Min. :0 Min. :0
## 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0.000e+00 Median :0 Median :0 Median :0
## Mean :5.746e-05 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :3.281e-03 Max. :0 Max. :0 Max. :0
##
## grvl_sand_pit harvest_area harvest_area_white_zone interchange_ramp
## Min. :0.000000 Min. :0.00000 Min. :0 Min. :0
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0 1st Qu.:0
## Median :0.000000 Median :0.00000 Median :0 Median :0
## Mean :0.003109 Mean :0.02293 Mean :0 Mean :0
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0 3rd Qu.:0
## Max. :0.109732 Max. :0.42899 Max. :0 Max. :0
##
## lagoon landfill lc_class110 lc_class120
## Min. :0.0000000 Min. :0 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.004449 1st Qu.:0.000e+00
## Median :0.0000000 Median :0 Median :0.046414 Median :0.000e+00
## Mean :0.0002406 Mean :0 Mean :0.054946 Mean :3.878e-06
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.082004 3rd Qu.:0.000e+00
## Max. :0.0126573 Max. :0 Max. :0.231159 Max. :6.011e-04
##
## lc_class20 lc_class210 lc_class220 lc_class230
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.4659 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.7228 Median :0.01425 Median :0.02315
## Mean :0.02123 Mean :0.6400 Mean :0.10735 Mean :0.06363
## 3rd Qu.:0.00000 3rd Qu.:0.8433 3rd Qu.:0.16066 3rd Qu.:0.08982
## Max. :0.38025 Max. :0.9858 Max. :0.84274 Max. :0.47473
##
## lc_class32 lc_class33 lc_class34 lc_class50
## Min. :0 Min. :0.000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.01205
## Median :0 Median :0.000000 Median :0.00000 Median :0.03874
## Mean :0 Mean :0.004366 Mean :0.03862 Mean :0.06991
## 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0.05870 3rd Qu.:0.09848
## Max. :0 Max. :0.243332 Max. :0.25234 Max. :0.55986
##
## low_impact_seismic mill mines_oilsands mines_pitlake
## Min. :0.000000 Min. :0 Min. :0 Min. :0
## 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0.000000 Median :0 Median :0 Median :0
## Mean :0.004172 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0.000063 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0.060391 Max. :0 Max. :0 Max. :0
##
## misc_oil_gas_facility oil_gas_plant open_pit_mine peat
## Min. :0.000000 Min. :0.000000 Min. :0.0000000 Min. :0
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0
## Median :0.000000 Median :0.000000 Median :0.0000000 Median :0
## Mean :0.002912 Mean :0.001167 Mean :0.0005665 Mean :0
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.0000000 3rd Qu.:0
## Max. :0.107208 Max. :0.071271 Max. :0.0389603 Max. :0
##
## pipeline recreation reservoir residence_clearing
## Min. :0.00000 Min. :0 Min. :0.000e+00 Min. :0
## 1st Qu.:0.00000 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0
## Median :0.02243 Median :0 Median :0.000e+00 Median :0
## Mean :0.02699 Mean :0 Mean :2.865e-05 Mean :0
## 3rd Qu.:0.03776 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0
## Max. :0.12204 Max. :0 Max. :4.441e-03 Max. :0
##
## ris_airp_runway ris_borrowpits ris_camp_industrial ris_clearing_unknown
## Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0
##
## ris_drainage ris_facility_operations ris_facility_unknown ris_mines_oilsands
## Min. :0 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0 Median :0.0000000 Median :0 Median :0
## Mean :0 Mean :0.0003528 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0.0546781 Max. :0 Max. :0
##
## ris_oilsands_rms ris_overburden_dump ris_plant ris_reclaim_ready
## Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0
##
## ris_reclaimed_certified ris_reclaimed_permanent ris_reclaimed_temp
## Min. :0 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.000000
## Median :0 Median :0.0000000 Median :0.000000
## Mean :0 Mean :0.0002803 Mean :0.000318
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.000000
## Max. :0 Max. :0.0434483 Max. :0.016762
##
## ris_road ris_soil_replaced ris_soil_salvaged ris_tailing_pond
## Min. :0.0000000 Min. :0 Min. :0 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0 1st Qu.:0.0000000
## Median :0.0000000 Median :0 Median :0 Median :0.0000000
## Mean :0.0000302 Mean :0 Mean :0 Mean :0.0009116
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0000000
## Max. :0.0046809 Max. :0 Max. :0 Max. :0.1413014
##
## ris_tank_farm ris_transmission_line ris_utilities ris_waste ris_windrow
## Min. :0 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0 Max. :0
##
## rlwy_dbl_track rlwy_mlt_track rlwy_sgl_track rlwy_spur road_gravel_1l
## Min. :0 Min. :0 Min. :0 Min. :0 Min. :0.000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0.000000
## Median :0 Median :0 Median :0 Median :0 Median :0.004254
## Mean :0 Mean :0 Mean :0 Mean :0 Mean :0.004548
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.007252
## Max. :0 Max. :0 Max. :0 Max. :0 Max. :0.022773
##
## road_gravel_2l road_paved_1l road_paved_2l road_paved_3l road_paved_4l
## Min. :0.000000 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0.000000 Median :0 Median :0 Median :0 Median :0
## Mean :0.001748 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0.000000 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0.015867 Max. :0 Max. :0 Max. :0 Max. :0
##
## road_paved_5l road_paved_div road_paved_undiv_1l road_paved_undiv_2l
## Min. :0 Min. :0 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0 Median :0 Median :0.0000000 Median :0.0000000
## Mean :0 Mean :0 Mean :0.0001162 Mean :0.0005722
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :0 Max. :0 Max. :0.0085401 Max. :0.0118399
##
## road_unclassified road_unimproved road_unpaved_1l road_unpaved_2l
## Min. :0.00e+00 Min. :0.000000 Min. :0 Min. :0
## 1st Qu.:0.00e+00 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0
## Median :0.00e+00 Median :0.000000 Median :0 Median :0
## Mean :2.20e-06 Mean :0.001069 Mean :0 Mean :0
## 3rd Qu.:0.00e+00 3rd Qu.:0.001017 3rd Qu.:0 3rd Qu.:0
## Max. :3.41e-04 Max. :0.010709 Max. :0 Max. :0
##
## road_winter rough_pasture runway rural_residence
## Min. :0 Min. :0.0000000 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0 Median :0.0000000 Median :0.000e+00 Median :0.000e+00
## Mean :0 Mean :0.0001776 Mean :9.358e-05 Mean :5.795e-06
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :0 Max. :0.0149983 Max. :1.451e-02 Max. :8.982e-04
##
## sump surrounding_veg tailing_pond tame_pasture
## Min. :0.000000 Min. :0 Min. :0 Min. :0.000e+00
## 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.000000 Median :0 Median :0 Median :0.000e+00
## Mean :0.003364 Mean :0 Mean :0 Mean :4.727e-06
## 3rd Qu.:0.002012 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.000e+00
## Max. :0.033997 Max. :0 Max. :0 Max. :7.326e-04
##
## trail transfer_station transmission_line truck_trail
## Min. :0.0000000 Min. :0 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.0000000
## Median :0.0001657 Median :0 Median :0.000000 Median :0.0000000
## Mean :0.0009478 Mean :0 Mean :0.007669 Mean :0.0008284
## 3rd Qu.:0.0015165 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0.0000000
## Max. :0.0068343 Max. :0 Max. :0.070051 Max. :0.0149490
##
## urban_industrial urban_residence vegetated_edge_railways
## Min. :0.000000 Min. :0 Min. :0
## 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0
## Median :0.000000 Median :0 Median :0
## Mean :0.002782 Mean :0 Mean :0
## 3rd Qu.:0.000000 3rd Qu.:0 3rd Qu.:0
## Max. :0.215891 Max. :0 Max. :0
##
## vegetated_edge_roads well_aband well_bitumen well_cased
## Min. :0.00000 Min. :0.000000 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.00379 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.0000000
## Median :0.01016 Median :0.001888 Median :0.000000 Median :0.0000000
## Mean :0.01569 Mean :0.004932 Mean :0.009243 Mean :0.0001615
## 3rd Qu.:0.02866 3rd Qu.:0.007000 3rd Qu.:0.012968 3rd Qu.:0.0000000
## Max. :0.06275 Max. :0.042874 Max. :0.083850 Max. :0.0071111
##
## well_cleared_not_confirmed well_cleared_not_drilled well_gas
## Min. :0.0000000 Min. :0 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.0000000 Median :0 Median :0.000e+00
## Mean :0.0006285 Mean :0 Mean :8.574e-05
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.000e+00
## Max. :0.0365581 Max. :0 Max. :2.579e-03
##
## well_oil well_other well_unknown
## Min. :0 Min. :0.000000 Min. :0
## 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0
## Median :0 Median :0.000000 Median :0
## Mean :0 Mean :0.001517 Mean :0
## 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0
## Max. :0 Max. :0.030134 Max. :0
##
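Many features in the summary are zero everywhere at this buffer width. Rather than reading them off the printed output, a sketch like the one below could list them directly. It uses a made-up data frame, so the column values are assumptions for illustration only.

```r
library(dplyr)

# made-up data: two sites measured at two buffer widths
toy <- tibble(
  site       = c('A', 'A', 'B', 'B'),
  buff_dist  = factor(c('250', '1000', '250', '1000')),
  pipeline   = c(0.01, 0.03, 0, 0.02),
  canal      = c(0, 0, 0, 0),
  golfcourse = c(0.1, 0, 0, 0)
)

all_zero_1000 <- toy %>%
  # filter to just buffer 1000 m
  filter(buff_dist == 1000) %>%
  # keep numeric columns only, then those where every value is zero
  select(where(is.numeric)) %>%
  select(where(~ all(.x == 0))) %>%
  names()

all_zero_1000
```

Note that a feature can be all zeros at one buffer width and not at another (golfcourse in the toy data), which is one reason to repeat the check at other widths.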
Let’s also plot histograms of each variable in a for loop for data visualization. I do this for just one buffer width to reduce replicates, but note that it will also drop any variables for which all the data are zeros. You could explore this at different buffer widths, or remove the filter() call and look at all the data, which is what I do below once the covariates are grouped.
# filter to just one buffer width
covariates_1000 <- covariates %>%
filter(buff_dist == 1000)
for (col in 1:ncol(covariates_1000)) {
hist(covariates_1000[,col])
}
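If you want a version of this loop that relies only on base R, the sketch below is one option. plot_numeric_histograms() is a hypothetical helper, and skipping columns with a single unique value is my own choice for the example, not something taken from the loop above.

```r
# Hypothetical helper: draw one histogram per numeric column that has more than
# one unique value, and return the names of the columns that were plotted.
plot_numeric_histograms <- function(df) {
  plotted <- character(0)
  for (col in names(df)) {
    x <- df[[col]]  # [[ ]] extracts the column as a plain vector
    if (is.numeric(x) && length(unique(x)) > 1) {
      hist(x, main = col, xlab = col)
      plotted <- c(plotted, col)
    }
  }
  invisible(plotted)
}

# made-up data for illustration
toy <- data.frame(site = c('A', 'B', 'C'),
                  pipeline = c(0.01, 0.03, 0),
                  canal = c(0, 0, 0))

plotted <- plot_numeric_histograms(toy)
```

Labelling each plot with the column name makes it easier to match histograms back to the feature descriptions.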
Now we can use the information from the previous few steps, together with the variable descriptions from the ABMI human footprints wall-to-wall data download website for Year 2021 (stored in the ‘relevant_literature’ folder of this repository and also copied into the README file), to group the covariates and reduce the number of potential variables to explore in the modeling phase.
We will use the mutate() function with some tidyverse trickery (i.e., nesting across() and contains() in rowSums()) to sum across each observation (row) by searching for various character strings. If there isn’t a common character string for multiple variables we want to sum, then we provide each one individually. We can also combine these methods (e.g., with ‘facilities’ [see code]).
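Before applying this to the full data, the pattern can be sketched on a toy data frame. The values are made up; the column names are borrowed from the real data only to make the example recognisable.

```r
library(dplyr)

# made-up data with two road features and two water features
toy <- tibble(site = c('A', 'B'),
              road_gravel_1l = c(0.1, 0),
              road_paved_2l = c(0.2, 0.05),
              canal = c(0, 0.3),
              reservoir = c(0.01, 0))

toy_grouped <- toy %>%
  mutate(
    # keyword approach: sum every column whose name contains 'road'
    roads = rowSums(across(contains('road'))),
    # no shared keyword: add the columns individually
    water = canal + reservoir,
    # drop the columns that were used to build the new ones
    .keep = 'unused')

toy_grouped
```

The result keeps site (it was not used in any sum) plus the two new grouped columns.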
covariates_grouped <- covariates %>%
# rename 'vegetated_edge_roads' so that we can use 'road' as a keyword to group roads without including this feature
rename('vegetated_edge_rds' = vegetated_edge_roads) %>%
# within the mutate function create new column names for the grouped variables
mutate(
# borrowpits
borrowpits = rowSums(across(contains('borrowpit'))) + # here we use rowSums() with across() and contains() to sum, across each row, the values of every column whose name contains the keyword. Check that the keyword doesn't also match columns you don't want to include!
dugout +
lagoon +
sump,
# clearings
clearings = rowSums(across(contains('clearing'))) +
runway,
# cultivations
cultivation = crop +
cultivation_abandoned +
fruit_vegetables +
rough_pasture +
tame_pasture,
# harvest areas
harvest = rowSums(across(contains('harvest'))),
# industrial facilities
facilities = rowSums(across(contains('facility'))) +
rowSums(across(contains('plant'))) +
camp_industrial +
mill +
ris_camp_industrial +
ris_tank_farm +
ris_utilities +
urban_industrial,
# mine areas
mines = rowSums(across(contains('mine'))) +
rowSums(across(contains('tailing'))) +
grvl_sand_pit +
peat +
ris_drainage +
ris_oilsands_rms +
ris_overburden_dump +
ris_reclaim_ready +
ris_soil_salvaged +
ris_waste,
# railways
railways = rowSums(across(contains('rlwy'))),
# reclaimed areas
reclaimed = rowSums(across(contains('reclaimed'))) +
ris_soil_replaced +
ris_windrow,
# recreation areas
recreation = campground +
golfcourse +
greenspace +
recreation,
# residential areas (can't use residence as keyword because 'residence_clearing' is in clearing unless we rearrange groupings or rename that one)
residential = country_residence +
rural_residence +
urban_residence,
# roads (we renamed 'vegetated_edge_roads' above to 'vegetated_edge_rds' so we can use roads as keyword here which saves a bunch of coding as there are many many road variables)
roads = rowSums(across(contains('road'))) +
interchange_ramp +
airp_runway +
ris_airp_runway +
transfer_station,
# seismic lines
seismic_lines = conventional_seismic,
# 3D seismic lines
seismic_lines_3D = low_impact_seismic,
# transmission lines
transmission_lines = rowSums(across(contains('transmission'))),
# trails
trails = rowSums(across(contains('trail'))),
# vegetated edges
veg_edges = rowSums(across(contains('vegetated'))) +
surrounding_veg,
# man-made water features
water = canal +
reservoir,
# well sites (this probably includes 'clearing_wellpad'; need to check)
wells = rowSums(across(contains('well'))),
# remove columns that were used to create new columns to tidy the data frame
.keep = 'unused') %>%
# reorder variables so the veg data is after all the HFI data
relocate(starts_with('lc_class'),
.after = wells)
# see what's left
names(covariates_grouped)
## [1] "array" "site" "camera"
## [4] "buff_dist" "borrowpits" "cfo"
## [7] "landfill" "pipeline" "recreation"
## [10] "clearings" "cultivation" "harvest"
## [13] "facilities" "mines" "railways"
## [16] "reclaimed" "residential" "roads"
## [19] "seismic_lines" "seismic_lines_3D" "transmission_lines"
## [22] "trails" "veg_edges" "water"
## [25] "wells" "lc_class110" "lc_class120"
## [28] "lc_class20" "lc_class210" "lc_class220"
## [31] "lc_class230" "lc_class32" "lc_class33"
## [34] "lc_class34" "lc_class50"
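Because columns are only dropped once mutate() has finished, a column whose name matches two keywords (the comment on wells suspects 'clearing_wellpad' is one) would be summed into both groups. One way to check is sketched below. `find_keyword_overlaps()` is a hypothetical helper written for this example, and every name in `toy_names` except 'clearing_wellpad' is made up. To check the real data, pass names(covariates) (the data frame before grouping) and the keywords used above.

```r
# hypothetical helper: for each column name, list the keywords it contains,
# and keep only the names that match more than one keyword
find_keyword_overlaps <- function(col_names, keywords) {
  matched <- lapply(col_names, function(nm) {
    # contains() ignores case by default, so compare in lower case here too
    keywords[vapply(
      keywords,
      function(k) grepl(tolower(k), tolower(nm), fixed = TRUE),
      logical(1)
    )]
  })
  names(matched) <- col_names
  matched[lengths(matched) > 1]
}

# toy column names for illustration only
toy_names <- c("clearing_wellpad", "well_abandoned", "road_paved", "harvest_area")

overlaps <- find_keyword_overlaps(
  toy_names,
  keywords = c("clearing", "well", "road", "harvest")
)

overlaps
```

Any name returned here is a candidate for double counting and could be renamed before grouping, as was done for 'vegetated_edge_roads'.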
# check the structure of new data
str(covariates_grouped)
## tibble [3,100 × 35] (S3: tbl_df/tbl/data.frame)
## $ array : Factor w/ 4 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ site : Factor w/ 155 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ camera : Factor w/ 96 levels "18","15","03",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ buff_dist : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ borrowpits : num [1:3100] 0 0 0 0 0 ...
## $ cfo : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ landfill : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ pipeline : num [1:3100] 0 0.068 0 0 0.0301 ...
## $ recreation : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ clearings : num [1:3100] 0.0923 0.0697 0 0 0 ...
## $ cultivation : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ harvest : num [1:3100] 0 0 0.687 0.337 0 ...
## $ facilities : num [1:3100] 0.291 0 0 0 0 ...
## $ mines : num [1:3100] 0 0.0873 0 0 0 ...
## $ railways : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ reclaimed : num [1:3100] 0 0.0477 0 0 0 ...
## $ residential : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ roads : num [1:3100] 0 0.0174 0 0 0 ...
## $ seismic_lines : num [1:3100] 0 0.03276 0 0.00889 0.01145 ...
## $ seismic_lines_3D : num [1:3100] 0 0 0 0 0.0523 ...
## $ transmission_lines: num [1:3100] 0.0642 0 0 0 0.091 ...
## $ trails : num [1:3100] 0.00588 0.0028 0 0.01591 0 ...
## $ veg_edges : num [1:3100] 0 0.0858 0 0 0 ...
## $ water : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ wells : num [1:3100] 0 0 0 0 0.0322 ...
## $ lc_class110 : num [1:3100] 0.193 0.348 0 0 0.178 ...
## $ lc_class120 : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_class20 : num [1:3100] 0.0361 0 0 0 0 ...
## $ lc_class210 : num [1:3100] 0.456 0.358 0.186 1 0.822 ...
## $ lc_class220 : num [1:3100] 0 0 0 0 0 ...
## $ lc_class230 : num [1:3100] 0 0.101 0.255 0 0 ...
## $ lc_class32 : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_class33 : num [1:3100] 0 0.101 0 0 0 ...
## $ lc_class34 : num [1:3100] 0 0.0916 0 0 0 ...
## $ lc_class50 : num [1:3100] 0.316 0 0.559 0 0 ...
# check summary of new data
summary(covariates_grouped)
## array site camera buff_dist borrowpits
## LU13:820 LU13_18: 20 27 : 80 250 : 155 Min. :0.000000
## LU15:780 LU13_15: 20 32 : 80 500 : 155 1st Qu.:0.000000
## LU21:720 LU13_03: 20 41 : 80 750 : 155 Median :0.001649
## LU01:780 LU13_34: 20 36 : 80 1000 : 155 Mean :0.004302
## LU13_57: 20 16 : 60 1250 : 155 3rd Qu.:0.004453
## LU13_16: 20 21 : 60 1500 : 155 Max. :0.310957
## (Other):2980 (Other):2660 (Other):2170
## cfo landfill pipeline recreation
## Min. :0.000e+00 Min. :0 Min. :0.00000 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0.00000 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0 Median :0.01350 Median :0.000e+00
## Mean :8.077e-07 Mean :0 Mean :0.01937 Mean :4.904e-05
## 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0.02812 3rd Qu.:0.000e+00
## Max. :1.215e-03 Max. :0 Max. :0.28897 Max. :1.337e-02
##
## clearings cultivation harvest facilities
## Min. :0.0000000 Min. :0.0000000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.0005278 Median :0.0000000 Median :0.00000 Median :0.000000
## Mean :0.0060419 Mean :0.0009397 Mean :0.01868 Mean :0.006653
## 3rd Qu.:0.0040539 3rd Qu.:0.0000000 3rd Qu.:0.01348 3rd Qu.:0.002769
## Max. :0.4024400 Max. :0.1253361 Max. :0.83674 Max. :0.335753
##
## mines railways reclaimed residential
## Min. :0.000000 Min. :0 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.0000000
## Median :0.000000 Median :0 Median :0.000000 Median :0.0000000
## Mean :0.005448 Mean :0 Mean :0.001002 Mean :0.0001473
## 3rd Qu.:0.000000 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0.0000000
## Max. :0.557884 Max. :0 Max. :0.078321 Max. :0.0180541
##
## roads seismic_lines seismic_lines_3D transmission_lines
## Min. :0.000000 Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.001040 1st Qu.:0.002686 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.004019 Median :0.006602 Median :0.000000 Median :0.000000
## Mean :0.006218 Mean :0.006732 Mean :0.004302 Mean :0.005597
## 3rd Qu.:0.008650 3rd Qu.:0.009985 3rd Qu.:0.001360 3rd Qu.:0.007232
## Max. :0.071829 Max. :0.045536 Max. :0.087550 Max. :0.173909
##
## trails veg_edges water wells
## Min. :0.000e+00 Min. :0.000000 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:9.465e-05 1st Qu.:0.001437 1st Qu.:0.000e+00 1st Qu.:0.0008692
## Median :7.187e-04 Median :0.006425 Median :0.000e+00 Median :0.0068416
## Mean :1.516e-03 Mean :0.011335 Mean :1.254e-05 Mean :0.0143883
## 3rd Qu.:1.958e-03 3rd Qu.:0.015562 3rd Qu.:0.000e+00 3rd Qu.:0.0167246
## Max. :3.864e-02 Max. :0.147895 Max. :7.896e-03 Max. :0.3045854
##
## lc_class110 lc_class120 lc_class20 lc_class210
## Min. :0.00000 Min. :0.0000000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.01970 1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.4607
## Median :0.03874 Median :0.0000000 Median :0.00000 Median :0.6749
## Mean :0.04838 Mean :0.0007554 Mean :0.02741 Mean :0.6324
## 3rd Qu.:0.06218 3rd Qu.:0.0000000 3rd Qu.:0.03361 3rd Qu.:0.8364
## Max. :0.73192 Max. :0.1211446 Max. :0.51965 Max. :1.0000
##
## lc_class220 lc_class230 lc_class32 lc_class33
## Min. :0.000000 Min. :0.00000 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.002332 1st Qu.:0.01218 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.044977 Median :0.03595 Median :0.000e+00 Median :0.0000000
## Mean :0.113317 Mean :0.06341 Mean :1.748e-05 Mean :0.0046114
## 3rd Qu.:0.154669 3rd Qu.:0.08419 3rd Qu.:0.000e+00 3rd Qu.:0.0005702
## Max. :0.971773 Max. :0.72101 Max. :1.175e-02 Max. :0.3242328
##
## lc_class34 lc_class50
## Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.02385
## Median :0.004043 Median :0.05717
## Mean :0.030311 Mean :0.07933
## 3rd Qu.:0.038149 3rd Qu.:0.11545
## Max. :0.557178 Max. :0.60824
##
# there are some NAs in the data, which will cause problems with modeling/visualization; remove those rows for now, and explore these sites specifically after the report
covariates_grouped <- covariates_grouped %>%
# remove rows with NAs
na.omit()
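Before dropping rows it can help to see where the NAs are. The sketch below uses a made-up `toy` data frame (hypothetical sites and values). The same two calls could be run on covariates_grouped before the na.omit() step.

```r
library(dplyr)

# hypothetical toy data with a couple of missing values
toy <- tibble(
  site  = c("LU01_01", "LU01_02", "LU01_03"),
  roads = c(0.1, NA, 0),
  wells = c(0, 0.2, NA)
)

# number of NAs in each column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# rows (sites) that have an NA in any column
rows_with_na <- toy %>%
  filter(if_any(everything(), is.na))

na_counts
rows_with_na
```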
Let’s look at the histograms again and see if we need to remove any features or feature groups without enough data.
# use for loop to plot histograms for all covariates
for (col in 5:ncol(covariates_grouped)) {
hist(covariates_grouped[,col])
}
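As a numeric complement to eyeballing the histograms, the share of non-zero values and the number of distinct values can be tabulated for each feature. This is only a sketch on a made-up `toy` data frame (hypothetical values), and it does not say what counts as "enough" variation, which remains a judgment call.

```r
library(dplyr)
library(tidyr)

# hypothetical toy data: one all-zero feature, one nearly all-zero, one with spread
toy <- tibble(
  site     = c("A", "B", "C", "D"),
  landfill = c(0, 0, 0, 0),
  cfo      = c(0, 0, 0, 0.001),
  roads    = c(0.01, 0.02, 0, 0.03)
)

variation_summary <- toy %>%
  summarise(
    across(
      where(is.numeric),
      list(
        prop_nonzero = ~ mean(.x > 0),   # share of rows with any cover
        n_unique     = ~ n_distinct(.x)  # number of distinct values
      ),
      .names = "{.col}__{.fn}"
    )
  ) %>%
  # reshape to one row per feature
  pivot_longer(
    everything(),
    names_to  = c("feature", ".value"),
    names_sep = "__"
  )

variation_summary
```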
> IMO we don’t have enough variation in data to use the following
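features/feature groups: cfo, cultivation, reclaimed, recreation, residential, water, lc_class20, lc_class120, lc_class32, and lc_class33.
We also don’t have any data for the following features, since they don’t
plot with the hist() function (both are all zeros in the summary above): landfill and railways.
Also, there’s not a lot of data for the following features, which are similar and of interest to OSM, so in the past they’ve been grouped together and we will do so here as well: borrowpits, clearings, facilities, and mines.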
So let’s modify this data and remove those features for now. This step will likely need to be changed each year.
Let’s also rename the landcover classes so they make more sense without having to look them up by number (maybe this should be added earlier in the script for next year).
covariates_grouped <- covariates_grouped %>%
# create column osm_industrial
mutate(
osm_industrial = borrowpits +
clearings +
facilities +
mines,
# remove columns we used to make this variable
.keep = 'unused') %>%
# remove other features we don't need
select(!c(cfo,
cultivation,
reclaimed,
recreation,
residential,
water,
lc_class20,
lc_class120,
lc_class32,
lc_class33,
landfill,
railways)) %>%
# rename landcover classes
rename(
grassland = lc_class110,
coniferous = lc_class210,
broadleaf = lc_class220,
mixed = lc_class230,
developed = lc_class34,
shrub = lc_class50)
# check that it worked
names(covariates_grouped)
## [1] "array" "site" "camera"
## [4] "buff_dist" "pipeline" "harvest"
## [7] "roads" "seismic_lines" "seismic_lines_3D"
## [10] "transmission_lines" "trails" "veg_edges"
## [13] "wells" "grassland" "coniferous"
## [16] "broadleaf" "mixed" "developed"
## [19] "shrub" "osm_industrial"
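If the landcover renaming is moved earlier in the script in future years, one option is to keep the class-to-name mapping in a single lookup vector. In this sketch the six name pairs are copied from the rename() call above, while `lc_lookup` and the `toy` data frame are hypothetical. any_of() skips any class that is not present in the data.

```r
library(dplyr)

# new name = old name, copied from the rename() call above
lc_lookup <- c(
  grassland  = "lc_class110",
  coniferous = "lc_class210",
  broadleaf  = "lc_class220",
  mixed      = "lc_class230",
  developed  = "lc_class34",
  shrub      = "lc_class50"
)

# hypothetical toy data containing only two of the six classes
toy <- tibble(
  site        = c("A", "B"),
  lc_class110 = c(0.2, 0.1),
  lc_class210 = c(0.5, 0.7)
)

toy_renamed <- toy %>%
  # rename whichever lookup columns exist; missing ones are ignored
  rename(any_of(lc_lookup))

names(toy_renamed)
```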
We need to subset the data so we have separate data frames for each buffer width to work with in the analysis AND to explore correlation between variables at each buffer width, as these may vary with spatial scale.
Let’s use a for loop to subset the data.
buffer_frames <- list()
for (i in unique(covariates_grouped$buff_dist)){
print(i)
# Subset data based on radius
df <- covariates_grouped %>%
filter(buff_dist == i)
# list of dataframes
buffer_frames <- c(buffer_frames, list(df))
}
## [1] "250"
## [1] "500"
## [1] "750"
## [1] "1000"
## [1] "1250"
## [1] "1500"
## [1] "1750"
## [1] "2000"
## [1] "2250"
## [1] "2500"
## [1] "2750"
## [1] "3000"
## [1] "3250"
## [1] "3500"
## [1] "3750"
## [1] "4000"
## [1] "4250"
## [1] "4500"
## [1] "4750"
## [1] "5000"
# name list objects so we can extract names for plotting
buffer_frames <- buffer_frames %>%
# a long-winded way to do this, but it works for the sake of time
purrr::set_names('250 meter buffer',
'500 meter buffer',
'750 meter buffer',
'1000 meter buffer',
'1250 meter buffer',
'1500 meter buffer',
'1750 meter buffer',
'2000 meter buffer',
'2250 meter buffer',
'2500 meter buffer',
'2750 meter buffer',
'3000 meter buffer',
'3250 meter buffer',
'3500 meter buffer',
'3750 meter buffer',
'4000 meter buffer',
'4250 meter buffer',
'4500 meter buffer',
'4750 meter buffer',
'5000 meter buffer')
Now we have a list with data frames for each buffer width which we can work with later.
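A more compact way to build the same kind of named list is sketched below on a made-up `toy` data frame (hypothetical values, three buffer widths only). It swaps the for loop for base split() and builds the names from the factor levels, so it is an alternative to the approach above and assumes buff_dist is a factor whose levels are in the desired order.

```r
library(dplyr)
library(purrr)

# hypothetical toy data: two sites at three buffer widths
toy <- tibble(
  site      = rep(c("A", "B"), times = 3),
  buff_dist = factor(rep(c("250", "500", "750"), each = 2),
                     levels = c("250", "500", "750")),
  roads     = c(0.01, 0.02, 0.03, 0.02, 0.04, 0.05)
)

# split() returns one data frame per factor level, named by that level
buffer_frames_toy <- split(toy, toy$buff_dist, drop = TRUE) %>%
  # append the label to each existing name
  set_names(~ paste(.x, "meter buffer"))

names(buffer_frames_toy)
```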
Now we need to make correlation plots for each buffer width to see
what variables are correlated at a given spatial scale. We can use
purrr::map() with the chart.Correlation()
function from the PerformanceAnalytics package to make
correlation plots with a specified method (e.g., pearson, spearman,
etc.) that also show histograms and scatterplots of each variable.
correlation_plots <- buffer_frames %>%
purrr::map(
~.x %>%
# select numeric variables only since we can't compute a correlation coefficient for non-numeric variables
select(where(is.numeric)) %>%
# use chart.Correlation() from the PerformanceAnalytics package
chart.Correlation(.,
histogram = TRUE,
method = "pearson")
)
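With 20 buffer widths, it can also help to list the strongly correlated pairs as a table rather than reading them off each plot. The sketch below uses base cor() on made-up data. `high_cor_pairs()` is a hypothetical helper, and the 0.7 cut-off is an assumption for illustration, not a threshold taken from this analysis.

```r
library(dplyr)
library(purrr)

# hypothetical helper: variable pairs with |r| at or above a threshold
high_cor_pairs <- function(df, threshold = 0.7) {
  cor_mat <- df %>%
    select(where(is.numeric)) %>%
    cor(method = "pearson")
  # blank the upper triangle and diagonal so each pair is listed once
  cor_mat[upper.tri(cor_mat, diag = TRUE)] <- NA
  as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE) %>%
    filter(!is.na(Freq), abs(Freq) >= threshold) %>%
    rename(var1 = Var1, var2 = Var2, r = Freq)
}

# hypothetical toy list standing in for buffer_frames
toy_frames <- list(
  `250 meter buffer` = tibble(
    site    = c("A", "B", "C", "D", "E"),
    roads   = c(1, 2, 3, 4, 5),
    wells   = c(2, 4, 6, 8, 10.5),
    harvest = c(5, 1, 4, 2, 3)
  ),
  `500 meter buffer` = tibble(
    site    = c("A", "B", "C", "D", "E"),
    roads   = c(1, 2, 3, 4, 5),
    wells   = c(3, 1, 5, 2, 4),
    harvest = c(4, 2, 1, 5, 3)
  )
)

# one row per strongly correlated pair, labelled by buffer width
correlated_pairs <- toy_frames %>%
  map(high_cor_pairs) %>%
  bind_rows(.id = "buffer")

correlated_pairs
```

On the real data, passing buffer_frames instead of `toy_frames` would give one table covering every buffer width.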